Goto

Collaborating Authors

 recursive feature elimination


An explainable Recursive Feature Elimination to detect Advanced Persistent Threats using Random Forest classifier

arXiv.org Artificial Intelligence

V. CONCLUSION This study developed an interpretable Intrusion Detection System (IDS) capable of detecting Advanced Persistent Threats (APTs) with high accuracy. By integrating Recursive Feature Elimination (RFE) and Random Forest (RF), the framework efficiently reduced dimensionality and improved detection performance . SHapley Additive exPlanations (SHAP) was integrated to provide both global and instance - level interpretability, enabling transparent insight into the model's decision - making process. Experimental evaluation demonstrated that the system achieved a detection accuracy of 99.9% and exhibited robust performance . Future work will evaluate the proposed RF - RFE framework in real - time deployment environments, where rapid response is crucial . Deep learning and ensemble - based models, such as Long Short - Term Memory (LSTM) networks can be explored to better capture temporal patterns in evolving cyber threats. These enhancements can improve the system's effectiveness and operational relevance in real - world intrusion detection scenarios. The framework will also be benchmarked against advanced classifiers, including LSTM, XGBoost, and ge nerative AI - based techniques to compare performance in terms of accuracy, interpretability, and adaptability.


ATwo-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments

arXiv.org Artificial Intelligence

High dimensionality in datasets produced by microarray technology presents a challenge for Machine Learning (ML) algorithms, particularly in terms of dimensionality reduction and handling imbalanced sample sizes. To mitigate the explained problems, we have proposedhybrid ensemble feature selection techniques with majority voting classifier for micro array classi f ication. Here we have considered both filter and wrapper-based feature selection techniques including Mutual Information (MI), Chi-Square, Variance Threshold (VT), Least Absolute Shrinkage and Selection Operator (LASSO), Analysis of Variance (ANOVA), and Recursive Feature Elimination (RFE), followed by Particle Swarm Optimization (PSO) for selecting the optimal features. This Artificial Intelligence (AI) approach leverages a Majority Voting Classifier that combines multiple machine learning models, such as Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to enhance overall performance and accuracy. By leveraging the strengths of each model, the ensemble approach aims to provide more reliable and effective diagnostic predictions. The efficacy of the proposed model has been tested in both local and cloud environments. In the cloud environment, three virtual machines virtual Central Processing Unit (vCPU) with size 8,16 and 64 bits, have been used to demonstrate the model performance. From the experiment it has been observed that, virtual Central Processing Unit (vCPU)-64 bits provides better classification accuracies of 95.89%, 97.50%, 99.13%, 99.58%, 99.11%, and 94.60% with six microarray datasets, Mixed Lineage Leukemia (MLL), Leukemia, Small Round Blue Cell Tumors (SRBCT), Lymphoma, Ovarian, andLung,respectively, validating the effectiveness of the proposed modelin bothlocalandcloud environments.


HCVR: A Hybrid Approach with Correlation-aware Voting Rules for Feature Selection

arXiv.org Artificial Intelligence

In this paper, we propose HCVR (Hybrid approach with Correlation-aware Voting Rules), a lightweight rule-based feature selection method that combines Parameter-to-Parameter (P2P) and Parameter-to-Target (P2T) correlations to eliminate redundant features and retain relevant ones. This method is a hybrid of non-iterative and iterative filtering approaches for dimensionality reduction. It is a greedy method, which works by backward elimination, eliminating possibly multiple features at every step. The rules contribute to voting for features, and a decision to keep or discard is made by majority voting. The rules make use of correlation thresholds between every pair of features, and between features and the target. We provide the results from the application of HCVR to the SPAMBASE dataset. The results showed improvement performance as compared to traditional non-iterative (CFS, mRMR and MI) and iterative (RFE, SFS and Genetic Algorithm) techniques. The effectiveness was assessed based on the performance of different classifiers after applying filtering.


Development of Interactive Nomograms for Predicting Short-Term Survival in ICU Patients with Aplastic Anemia

arXiv.org Artificial Intelligence

Aplastic anemia is a rare, life-threatening hematologic disorder characterized by pancytopenia and bone marrow failure. ICU admission in these patients often signals critical complications or disease progression, making early risk assessment crucial for clinical decision-making and resource allocation. In this study, we used the MIMIC-IV database to identify ICU patients diagnosed with aplastic anemia and extracted clinical features from five domains: demographics, synthetic indicators, laboratory results, comorbidities, and medications. Over 400 variables were reduced to seven key predictors through machine learning-based feature selection. Logistic regression and Cox regression models were constructed to predict 7-, 14-, and 28-day mortality, and their performance was evaluated using AUROC. External validation was conducted using the eICU Collaborative Research Database to assess model generalizability. Among 1,662 included patients, the logistic regression model demonstrated superior performance, with AUROC values of 0.8227, 0.8311, and 0.8298 for 7-, 14-, and 28-day mortality, respectively, compared to the Cox model. External validation yielded AUROCs of 0.7391, 0.7119, and 0.7093. Interactive nomograms were developed based on the logistic regression model to visually estimate individual patient risk. In conclusion, we identified a concise set of seven predictors, led by APS III, to build validated and generalizable nomograms that accurately estimate short-term mortality in ICU patients with aplastic anemia. These tools may aid clinicians in personalized risk stratification and decision-making at the point of care.


Sentiment-driven prediction of financial returns: a Bayesian-enhanced FinBERT approach

arXiv.org Artificial Intelligence

Predicting financial returns accurately poses a significant challenge due to the inherent uncertainty in financial time series data. Enhancing prediction models' performance hinges on effectively capturing both social and financial sentiment. In this study, we showcase the efficacy of leveraging sentiment information extracted from tweets using the FinBERT large language model. By meticulously curating an optimal feature set through correlation analysis and employing Bayesian-optimized Recursive Feature Elimination for automatic feature selection, we surpass existing methodologies, achieving an F1-score exceeding 70% on the test set. This success translates into demonstrably higher cumulative profits during backtested trading. Our investigation focuses on real-world SPY ETF data alongside corresponding tweets sourced from the StockTwits platform.


IGRF-RFE: A Hybrid Feature Selection Method for MLP-based Network Intrusion Detection on UNSW-NB15 Dataset

arXiv.org Artificial Intelligence

The effectiveness of machine learning models is significantly affected by the size of the dataset and the quality of features as redundant and irrelevant features can radically degrade the performance. This paper proposes IGRF-RFE: a hybrid feature selection method tasked for multi-class network anomalies using a Multilayer perceptron (MLP) network. IGRF-RFE can be considered as a feature reduction technique based on both the filter feature selection method and the wrapper feature selection method. In our proposed method, we use the filter feature selection method, which is the combination of Information Gain and Random Forest Importance, to reduce the feature subset search space. Then, we apply recursive feature elimination(RFE) as a wrapper feature selection method to further eliminate redundant features recursively on the reduced feature subsets. Our experimental results obtained based on the UNSW-NB15 dataset confirm that our proposed method can improve the accuracy of anomaly detection while reducing the feature dimension. The results show that the feature dimension is reduced from 42 to 23 while the multi-classification accuracy of MLP is improved from 82.25% to 84.24%.


Deploying a Machine learning model as a Chatbot (Part 1)

#artificialintelligence

The Dataset we are going to use is the Loan prediction dataset. The loan prediction dataset is a unique dataset that contains 12 columns. The data was gathered to predict if a customer is eligible for a loan. The Dataset is publicly available on Kaggle and can be accessed using this link. Let's Start with the bottom-up approach and build a simple Machine learning model.


Deploying Supervised Machine Learning Model Using Flask and Docker

#artificialintelligence

Before jumping to Supervised Machine Learning, let's understand a bit about Machine Learning. Machine Learning is an enticing field of study that leverages mathematics to solve complex real-world problems. The traditional algorithms need us to give a set of data and rules to the system and produce answers based on that. The machine learning algorithms take in data and answers and produce rules based on data and answers. The data and answers refer to the features and target of the data and the rules produced refer to the trained model.


Guide To Dimensionality Reduction With Recursive Feature Elimination

#artificialintelligence

Therefore, feature elimination in statistics and machine learning is referred to as choosing a subset of relevant features from the dataset to use in further …


Fibonacci and k-Subsecting Recursive Feature Elimination

arXiv.org Machine Learning

Feature selection is a data mining task with the potential of speeding up classification algorithms, enhancing model comprehensibility, and improving learning accuracy. However, finding a subset of features that is optimal in terms of predictive accuracy is usually computationally intractable. Out of several heuristic approaches to dealing with this problem, the Recursive Feature Elimination (RFE) algorithm has received considerable interest from data mining practitioners. In this paper, we propose two novel algorithms inspired by RFE, called Fibonacci- and k-Subsecting Recursive Feature Elimination, which remove features in logarithmic steps, probing the wrapped classifier more densely for the more promising feature subsets. The proposed algorithms are experimentally compared against RFE on 28 highly multidimensional datasets and evaluated in a practical case study involving 3D electron density maps from the Protein Data Bank. The results show that Fibonacci and k-Subsecting Recursive Feature Elimination are capable of selecting a smaller subset of features much faster than standard RFE, while achieving comparable predictive performance.